incomplete data
Processing of missing data by neural networks
Marek Śmieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek
Our idea is to replace the typical neuron's response in the first hidden layer with its expected value. This approach can be applied to various types of networks with minimal modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experiments performed on different types of architectures show that our method gives better results than typical imputation strategies and other methods dedicated to incomplete data.
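The core step, replacing a first-layer neuron's response with its expected value, can be illustrated for a single ReLU unit. This is a simplified sketch under a stated assumption, not the authors' implementation: each missing coordinate is modeled as an independent Gaussian with a given mean and variance, whereas the paper may use a richer density model. Under that assumption the pre-activation $w^\top x + b$ is Gaussian with mean $\mu$ and standard deviation $\sigma$, and $E[\max(0, Z)] = \mu\,\Phi(\mu/\sigma) + \sigma\,\varphi(\mu/\sigma)$. The names `expected_relu`, `first_layer_expected_response`, `mean` and `var` are hypothetical.

```python
import math

import numpy as np


def expected_relu(mu: float, sigma: float) -> float:
    """E[max(0, Z)] for Z ~ N(mu, sigma^2): mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    if sigma <= 0.0:
        return max(0.0, mu)  # no uncertainty left: ordinary ReLU
    t = mu / sigma
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def first_layer_expected_response(x, w, b, mean, var) -> float:
    """Expected ReLU response of one first-layer neuron for an input with NaNs.

    Assumption (simplification): missing coordinate j ~ N(mean[j], var[j]),
    independently of the other coordinates.
    """
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    mean, var = np.asarray(mean, dtype=float), np.asarray(var, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, mean, x)  # observed values, mean where missing
    mu = float(w @ filled + b)           # mean of the pre-activation w.x + b
    sigma = math.sqrt(float(np.sum(w[missing] ** 2 * var[missing])))  # its std
    return expected_relu(mu, sigma)
```

When no coordinate is missing, $\sigma = 0$ and the function reduces to the ordinary ReLU, so complete inputs are processed exactly as before.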
Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address this issue, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. First, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Second, we develop a series of affinity learning methods that equip the self-expressive framework with the $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.
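The abstract does not spell out the kernel correction itself, so the following is only a generic illustration of the problem it targets, not the paper's method. A Gaussian kernel is estimated from partially observed samples using only co-observed features; such a matrix need not be positive semidefinite, and one simple repair is to clip its negative eigenvalues, which gives the nearest PSD matrix in Frobenius norm. The function names, the `d / k` rescaling of partial distances, and `bandwidth` are assumptions chosen for the sketch.

```python
import numpy as np


def incomplete_sq_dists(X: np.ndarray) -> np.ndarray:
    """Pairwise squared distances from co-observed features only (NaN = missing).

    Each partial sum is rescaled by d / k (k = number of co-observed features)
    so pairs sharing few features are not systematically judged closer.
    Pairs with no shared feature get +inf, i.e. zero kernel value.
    """
    n, d = X.shape
    observed = ~np.isnan(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = observed[i] & observed[j]
            k = int(both.sum())
            if k == 0:
                D[i, j] = D[j, i] = np.inf
            else:
                diff = X[i, both] - X[j, both]
                D[i, j] = D[j, i] = float(diff @ diff) * d / k
    return D


def gaussian_kernel(D: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) kernel from squared distances."""
    return np.exp(-D / (2.0 * bandwidth ** 2))


def project_to_psd(K: np.ndarray) -> np.ndarray:
    """Nearest PSD matrix in Frobenius norm: symmetrize, clip negative eigenvalues."""
    K_sym = 0.5 * (K + K.T)
    eigvals, eigvecs = np.linalg.eigh(K_sym)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
```

For complete data the rescaling factor is 1 and the kernel is already PSD, so the projection leaves it (numerically) unchanged; spectral clustering can then run on the repaired matrix as on any pre-defined kernel.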
Scalable Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data
Continuous-time Bayesian Networks (CTBNs) represent a compact yet powerful framework for understanding multivariate time-series data. Given complete data, parameters and structure can be estimated efficiently in closed-form. However, if data is incomplete, the latent states of the CTBN have to be estimated by laboriously simulating the intractable dynamics of the assumed CTBN. This is a problem, especially for structure learning tasks, where this has to be done for each element of a super-exponentially growing set of possible structures. In order to circumvent this notorious bottleneck, we develop a novel gradient-based approach to structure learning. Instead of sampling and scoring all possible structures individually, we assume the generator of the CTBN to be composed as a mixture of generators stemming from different structures. In this framework, structure learning can be performed via a gradient-based optimization of mixture weights. We combine this approach with a new variational method that allows for a closed-form calculation of this mixture marginal likelihood. We show the scalability of our method by learning structures of previously inaccessible sizes from synthetic and real-world data.
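To make the mixture-of-generators idea concrete, the sketch below mixes the generator of one binary node under two hypothetical candidate structures (no parent vs. one binary parent `u`) using softmax weights. A convex combination of valid generators (non-negative off-diagonal rates, rows summing to zero) is again a valid generator, which is what makes the mixture weights a continuous target for gradient-based optimization. The rates, function names and softmax parametrization are assumptions; the variational marginal likelihood described in the abstract is not implemented here.

```python
import numpy as np


def is_valid_generator(Q: np.ndarray, tol: float = 1e-9) -> bool:
    """CTMC generator check: non-negative off-diagonal rates, rows summing to zero."""
    off_diag = Q - np.diag(np.diag(Q))
    return bool(np.all(off_diag >= -tol) and np.allclose(Q.sum(axis=1), 0.0, atol=tol))


def softmax(theta: np.ndarray) -> np.ndarray:
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()


def mixture_generator(generators, theta):
    """Return (Q, pi) with Q = sum_k pi_k * Q_k and pi = softmax(theta)."""
    weights = softmax(np.asarray(theta, dtype=float))
    Q = sum(w * Qk for w, Qk in zip(weights, generators))
    return Q, weights


# Hypothetical rates for a binary node X under two candidate structures:
# (A) X has no parent; (B) X has one binary parent U, so its rates depend on u.
Q_no_parent = np.array([[-1.0, 1.0], [2.0, -2.0]])
Q_with_parent = {
    0: np.array([[-0.5, 0.5], [3.0, -3.0]]),
    1: np.array([[-4.0, 4.0], [0.2, -0.2]]),
}

# Mixed generator of X while the parent is in state u = 1, with equal weights.
Q_mix, pi = mixture_generator([Q_no_parent, Q_with_parent[1]], theta=[0.0, 0.0])
```

Here `theta` is fixed by hand; in a structure-learning setting it would be the quantity updated by gradient steps, and a weight concentrating on one component corresponds to selecting that structure.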
Robustness to Adversarial Perturbations in Learning from Incomplete Data
What is the role of unlabeled data in an inference problem when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue. Moreover, our analysis quantifies the role of unlabeled data in generalization under more general conditions than existing theoretical work on SSL. Based on our framework, we also present a hybrid of DRL and EM algorithms with a guaranteed convergence rate. When implemented with deep neural networks, our method achieves performance comparable to the state of the art on a number of real-world benchmark datasets.
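The paper's adversarial and semi-supervised complexity measures extend the classical empirical Rademacher complexity $\hat{R}_S(\mathcal{H}) = E_\sigma[\sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i h(x_i)]$. Since their exact definitions are not given here, the sketch below only estimates the classical quantity by Monte Carlo, for a finite hypothesis class represented by its predictions on the sample. The function name and its arguments are hypothetical.

```python
import numpy as np


def empirical_rademacher(predictions: np.ndarray, n_draws: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_sigma[ max_h (1/n) sum_i sigma_i h(x_i) ].

    predictions: shape (num_hypotheses, n); row k holds h_k(x_1), ..., h_k(x_n).
    """
    H = np.asarray(predictions, dtype=float)
    n = H.shape[1]
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # i.i.d. Rademacher signs
    correlations = sigma @ H.T / n                      # shape (n_draws, num_hypotheses)
    return float(correlations.max(axis=1).mean())       # sup over h, then average
```

A richer class can fit random signs better, so the estimate grows with the class: a single fixed hypothesis gives a value near 0, while the class of all $2^n$ sign patterns attains the maximum value 1.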
Support Recovery in Sparse PCA with Incomplete Data
We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data. Our algorithm is based on the semidefinite program (SDP) relaxation of the non-convex $\ell_1$-regularized PCA problem. We provide theoretical and experimental evidence that the SDP enables us to exactly recover the true support of the sparse leading eigenvector of the unknown true matrix, despite only observing an incomplete (missing uniformly at random) and noisy version of it. We derive sufficient conditions for exact recovery, which involve matrix incoherence, the spectral gap between the largest and second-largest eigenvalues, the observation probability, and the noise variance. We validate our theoretical results with incomplete synthetic data, and show encouraging and meaningful results on a gene expression dataset.
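For reference, one standard way to write an $\ell_1$-regularized PCA problem and its SDP relaxation (in the style of the DSPCA formulation of d'Aspremont et al.) is shown below. Here $\hat{\Sigma}$ denotes the matrix estimated from the incomplete, noisy observations and $\rho > 0$ the regularization weight; both symbols are our notation, and the paper's exact scaling of the penalty may differ. Lifting $X = xx^\top$ turns every term into a linear function of $X$, and dropping the rank-one constraint leaves a convex program.

```latex
\begin{aligned}
\text{(non-convex)}\quad & \max_{x \in \mathbb{R}^n,\ \|x\|_2 = 1} \; x^\top \hat{\Sigma} x - \rho \, \|x\|_1^2 \\
\text{(lift } X = x x^\top\text{)}\quad & x^\top \hat{\Sigma} x = \operatorname{tr}(\hat{\Sigma} X), \quad
  \|x\|_1^2 = \textstyle\sum_{i,j} |X_{ij}|, \quad \|x\|_2^2 = \operatorname{tr}(X) \\
\text{(SDP, } \operatorname{rank}(X) = 1 \text{ dropped)}\quad & \max_{X \succeq 0,\ \operatorname{tr}(X) = 1} \;
  \operatorname{tr}(\hat{\Sigma} X) - \rho \textstyle\sum_{i,j} |X_{ij}|
\end{aligned}
```

If the SDP solution is rank one, $\hat{X} = \hat{x}\hat{x}^\top$, the recovered support is the set of indices where $\hat{x}$ is nonzero; exact support recovery means this set coincides with the support of the true sparse leading eigenvector.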